Towards Language Independent NE Identification in the context of Wikipedia
Abstract
Named Entity Identification/Recognition is a key component of most Information Extraction tasks. All existing approaches to Named Entity Identification (NEI) rely on extensive language-specific resources. This paper addresses the problem of multilingual NEI and explains the need to solve it in a language-independent fashion. We focus on less-resourced languages such as Hindi and Tamil, for which language resources are not widely available. The inherent structure of Wikipedia articles in multiple languages is exploited for NEI in these less-resourced languages. The other major contribution of this work is to extend identification from the phrase level to the word level, on the intuition that this would increase the coverage of Named Entities. We evaluate our approach on comparable lists of NEs in Hindi, Tamil, and English that were manually collected from the Named Entity Workshop (NEWS).
Similar resources
Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
Named Entity Corpus Construction using Wikipedia and DBpedia Ontology
In this paper, we propose a novel method to automatically build a named entity corpus based on the DBpedia ontology. Since most named entity recognition systems require time- and effort-consuming annotation tasks to produce training data, work on NER has thus far been limited to certain languages like English that are resource-abundant in general. As an alternative, we suggest that the NE corpus gene...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Why Figurative Language: Perceived Discourse Goals for Metaphors and Similes by L2 Learners
The goal of this study was to investigate the kinds of discourse goals that Iranian EFL learners perceive as the most probable reasons behind the utterance of figurative language, metaphors and similes, with reference to 4 independent variables of Figure Type (Metaphor or Simile), Tenor Concreteness (Concrete or Abstract), Context (List Format or Story), and Modality (Oral, Written, and Both). ...
Mining Transliterations from Wikipedia using Dynamic Bayesian Networks
Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipe...
Publication date: 2010